[FEAT] Add optimization.n_repeats for repeated benchmark runs #28

Draft
VincentG1234 wants to merge 1 commit into main from FEAT/n-repeats-benchmark

@VincentG1234

Collaborator

Summary

Add optimization.n_repeats (default 1), which runs multiple GuideLLM benchmarks per Optuna trial configuration against the same vLLM server, reports the mean of each objective to Optuna, and exposes the spread across repeats as Optuna user attributes for log_metrics.

Why

Benchmark results are noisy. Repeating each configuration a few times (typically 2–3) reduces the variance of the objective reported to the sampler without changing the meaning of n_trials: each trial is still one unique configuration.
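
To illustrate the variance argument, here is a small simulation. It is only a sketch: it assumes independent Gaussian noise around a fixed true value, which real benchmark noise may not follow, and the function name and default numbers are made up for illustration.

```python
import random
import statistics


def reported_objective_spread(n_repeats, n_trials=2000, true_value=100.0,
                              noise_std=5.0, seed=0):
    """Standard deviation of the per-trial mean objective.

    Each simulated trial averages n_repeats noisy measurements of the same
    configuration, mimicking what would be reported to the sampler.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    reported = []
    for _ in range(n_trials):
        runs = [rng.gauss(true_value, noise_std) for _ in range(n_repeats)]
        reported.append(statistics.fmean(runs))
    return statistics.stdev(reported)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(reported_objective_spread(n), 2))
```

Under these assumptions the spread shrinks roughly with the square root of the number of repeats, which is why 2–3 repeats can already help.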

What changed

  • auto_tune_vllm/core/config.py — add n_repeats: int = 1 and validation (>= 1)
  • auto_tune_vllm/execution/trial_controller.py — loop benchmark runs after a single vLLM startup; aggregate objectives/metrics by mean; store per-run values under detailed_metrics.repeats when n_repeats > 1; fail the whole trial if any repeat fails
  • auto_tune_vllm/core/study_controller.py — for log_metrics only, when n_repeats > 1, write metric_<name>, metric_<name>_rel_range, metric_<name>_values, and n_repeats as Optuna user attrs
  • docs/configuration.md — document n_repeats and repeat-related log_metrics attrs
  • examples/study_config.yaml — minimal smoke-test config (n_trials: 3, n_repeats: 2, baseline disabled)
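
The changes listed above can be sketched roughly as follows. This is not the code from the PR: the class and function names are hypothetical, the benchmark is a plain callable, and the definition of rel_range as (max - min) / |mean| is an assumption, since the description does not spell it out.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Dict, List


@dataclass
class OptimizationSettings:
    """Hypothetical stand-in for the config class in core/config.py."""
    n_repeats: int = 1

    def __post_init__(self) -> None:
        if self.n_repeats < 1:
            raise ValueError(f"n_repeats must be >= 1, got {self.n_repeats}")


def run_trial(run_benchmark: Callable[[], Dict[str, float]],
              n_repeats: int) -> Dict[str, object]:
    """Run the benchmark n_repeats times against one already-started server.

    An exception raised by any repeat propagates, so the whole trial fails.
    """
    runs: List[Dict[str, float]] = [run_benchmark() for _ in range(n_repeats)]
    # Aggregate every metric by its mean across repeats.
    metrics = {name: fmean([run[name] for run in runs]) for name in runs[0]}
    detailed: Dict[str, object] = {}
    if n_repeats > 1:
        detailed["repeats"] = runs  # per-run values kept only when repeating
    return {"metrics": metrics, "detailed_metrics": detailed}


def repeat_user_attrs(runs: List[Dict[str, float]],
                      log_metrics: List[str]) -> Dict[str, object]:
    """Build the user attributes written for log_metrics when n_repeats > 1."""
    attrs: Dict[str, object] = {"n_repeats": len(runs)}
    for name in log_metrics:
        values = [run[name] for run in runs]
        mean = fmean(values)
        attrs[f"metric_{name}"] = mean
        # Assumed definition of relative range: (max - min) / |mean|.
        spread = max(values) - min(values)
        attrs[f"metric_{name}_rel_range"] = spread / abs(mean) if mean else 0.0
        attrs[f"metric_{name}_values"] = values
    return attrs
```

In a real study, each entry of the returned dictionary would be passed to Optuna's trial.set_user_attr(key, value).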

How tested

  • ruff check .
  • pytest -v tests/ (60 passed)
  • Manual E2E (maintainer): auto-tune-vllm optimize --config examples/study_config.yaml, then inspect ./optuna_studies/n_repeats_smoke_test/study.db in Optuna Dashboard for metric_*_rel_range / metric_*_values
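
As an alternative to the dashboard, the repeat-related attributes could also be checked from Python. The sketch below is an assumption-laden example: the helper names are hypothetical, and the study name is guessed from the directory path and may differ from what the config actually sets.

```python
from typing import Any, Dict

REPEAT_SUFFIXES = ("_rel_range", "_values")


def repeat_attrs(user_attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only n_repeats and the metric_*_rel_range / metric_*_values keys."""
    return {
        key: value
        for key, value in user_attrs.items()
        if key == "n_repeats"
        or (key.startswith("metric_") and key.endswith(REPEAT_SUFFIXES))
    }


def print_repeat_attrs(storage_url: str, study_name: str) -> None:
    """Print the repeat-related user attributes of every trial in a study."""
    import optuna  # imported lazily so repeat_attrs works without Optuna

    study = optuna.load_study(study_name=study_name, storage=storage_url)
    for trial in study.trials:
        print(trial.number, repeat_attrs(trial.user_attrs))


if __name__ == "__main__":
    # Path taken from the manual test above; the study name is an assumption.
    print_repeat_attrs(
        "sqlite:///optuna_studies/n_repeats_smoke_test/study.db",
        "n_repeats_smoke_test",
    )
```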

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>